Abstract

In the medical domain, data features often contain missing values, which can introduce
serious bias into predictive modeling. Standard data mining methods often perform poorly
on such data. In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. The proposed method is based on a
multilevel framework of the cost-sensitive SVM and the expectation-maximization (EM)
imputation method for missing values, which relies on iterated regression analyses. We
compare the classification results of multilevel SVM-based algorithms on public benchmark
datasets with imbalanced classes and missing values, as well as on real data from health
applications, and show that our multilevel SVM-based method produces fast, more accurate,
and more robust classification results.


1 The role of predictive modeling in healthcare 

Modern healthcare can be characterized as evidence-driven and model-assisted [1]. Ideally,
every decision in the clinical environment should be supported by a statistical model that
predicts risks and positive outcomes. This model may take the form of a simplified
risk-assessment formula [2] or a sophisticated machine learning tool [4, 5]. In either case,
it is based on a query of relevant clinical and operational history.

In practice, comprehensive medical information is stored in multiple databases, with
different formats and rules of access. Due to considerations of patient privacy and the
proprietary nature of electronic medical records [3], the databases cannot be queried
continuously. Every instance of data acquisition and integration is a separate effort that is
cost-effective only when the resulting predictive model is of high quality. Thus, progress
in evidence-driven healthcare depends on how well state-of-the-art machine learning
algorithms are adapted to clinical data.

We note that classical computer science issues, such as scalability or convergence rate,
are rarely a major concern in healthcare applications. Instead, an algorithm is ranked by
its ability to process raw medical data, with such problematic features as sparsity, missing
entries, noise, and imbalanced outputs. Because patient-provider interaction takes place in
discrete encounters, medical data is inherently sparse: when a clinical encounter occurs, the
number and content of the labels attached to it vary widely [6]; outside of an encounter, the
state of the patient is unknown. The outcomes of interest in classification problems are
imbalanced because, as a rule, healthcare analytics is motivated by rare events such as
healthcare emergencies, severe chronic conditions, and gaps and bottlenecks in access to care.
The extent to which medical data is problematic may not be obvious from the perspective of a
local healthcare provider (such as a single doctor), who by definition has access to all the
knowledge they ever use. We view this paper as a short, high-level primer on using advanced
machine learning methods to overcome the difficulties that emerge after multiple datasets are
integrated for analysis and prediction.

This work was prompted by several projects completed with the Division of Applied
Research and Clinical Informatics, Dept. of Data Science, Geisinger Health System. The
routine activity of Data Science consists of medium-scope predictive projects on a
combination of patient biometrics, pathology lab results, clinical encounter data, medical
insurance data (available directly from Geisinger Health Plan), and externally assigned
aggregate metrics for patients’ general lifestyle risks, compliance with treatment regime,
and loyalty to a particular provider.

For the first motivating example (Example 1), we use our 2014 feasibility study [7] of
merging insurance information (6 aggregate features, based on the history of claims and
payments) with clinical encounter information (10-20 features chosen by hand from patient
biometrics, medications, and diagnostic codes). The goal of the initial study was to predict
the financial risk for a particular patient (a common metric in insurance practice, derived
as the ratio of individual expenses to average expenses for a large demographic group).
Furthermore, we wanted to see how adding the clinical information changes the predictive
power of the model, thus making a case for the existence of high-risk patients who are
invisible to claims-based analysis. We used a standard technique, k-nearest neighbors with
empirically selected weighting, to obtain the baseline results.

For Example 2, we use our preliminary investigation of patients’ response to public
outreach [8], such as annual flu awareness campaigns. We included basic demographic and
clinical information on patients targeted by 35 such campaigns in a model predicting
whether a given patient is likely to respond to the reminder, or to choose not to get
vaccinated or to use a different provider. Again, our core predictive model was standard:
logistic regression with empirically selected weighting of the training data.

We now pose a question: how much more effective would the predictive models be in 
each case with the use of an advanced machine learning algorithm developed with awareness 
of sparsity and class skewness (imbalance) in data?


2 Support Vector Machine Algorithms for Medical Data 

Given a training set $\mathcal{J} = \{(x_i, y_i)\}_{i=1}^{l}$, that is, a set of data points with known
labels, where $x_i \in \mathbb{R}^n$, $l$ and $n$ are the numbers of data points and features,
respectively, and $y_i \in \{-1, 1\}$ denotes the class label of data point $i$ in $\mathcal{J}$. We denote
by $\mathcal{C}^-$ and $\mathcal{C}^+$ the “majority” (points with $y_i = -1$) and “minority” (points with
$y_i = +1$) classes, respectively, so that $\mathcal{J} = \mathcal{C}^+ \cup \mathcal{C}^-$.

2.1 Support Vector Machines 

The support vector machine (SVM) solves the following max-margin problem: 

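$$
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i && \text{(1a)}\\
\text{s.t.}\quad & y_i\left(w^\top\phi(x_i) + b\right) \ge 1 - \xi_i, \quad i = 1, \dots, l && \text{(1b)}\\
& \xi_i \ge 0, \quad i = 1, \dots, l && \text{(1c)}
\end{aligned}
$$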

where the optimal margin is defined by the parameters $w$ and $b$. The training data points $x_i$
are mapped into a higher-dimensional space through a function $\phi: \mathbb{R}^n \to \mathbb{R}^m$ ($m > n$).
Misclassified points are penalized through the slack variables $\xi_i$ ($i \in \{1, \dots, l\}$), and
the parameter $C > 0$ controls the magnitude of the penalty. This formulation is therefore
called the soft-margin SVM. The primal formulation is usually transformed into its Lagrangian
dual, which can be solved by different algorithms. One of the most popular is sequential
minimal optimization (SMO), implemented in the LIBSVM tool [9], since it is fast and
converges reliably.
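For a fixed hyperplane $(w, b)$, the smallest slacks that satisfy constraints (1b) and (1c) are $\xi_i = \max(0,\ 1 - y_i(w^\top\phi(x_i) + b))$, the hinge loss, so objective (1a) can be evaluated directly. The Python sketch below does this with NumPy under the assumption of a linear kernel ($\phi(x) = x$); the function name `soft_margin_objective` is hypothetical, and the code only evaluates the objective, it does not solve the problem the way SMO in LIBSVM does.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate the soft-margin primal objective (1a) for a fixed (w, b).

    Assumes a linear kernel, phi(x) = x. For fixed (w, b) the smallest
    slacks satisfying (1b)-(1c) are the hinge losses computed below.
    """
    margins = y * (X @ w + b)              # y_i (w^T x_i + b)
    xi = np.maximum(0.0, 1.0 - margins)    # slack variables, xi_i >= 0
    return 0.5 * float(w @ w) + C * float(xi.sum()), xi

# Toy data: the second point lies inside the margin and receives slack 0.5.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
obj, xi = soft_margin_objective(np.array([1.0, 0.0]), 0.0, X, y, C=10.0)
# obj == 5.5 (0.5 * ||w||^2 = 0.5 plus C * 0.5 = 5.0); xi == [0, 0.5, 0]
```

Raising $C$ makes the same margin violation more expensive; the weighted SVM splits this single knob into one cost per class.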

2.2 Weighted Support Vector Machines 

A cost-sensitive extension of the SVM, developed to cope with imbalanced data, is known as the
weighted SVM (WSVM) [10]. The main idea is to introduce a weighting scheme into learning, so
that the WSVM algorithm builds the decision hyperplane based on the relative contribution of
the data points in training. In contrast to the standard SVM, the penalization costs differ
for the positive ($C^+$) and negative ($C^-$) classes:

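$$
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\|w\|^2 + C^+\sum_{\{i \mid y_i = +1\}}\xi_i + C^-\sum_{\{j \mid y_j = -1\}}\xi_j && \text{(2a)}\\
\text{s.t.}\quad & y_i\left(w^\top\phi(x_i) + b\right) \ge 1 - \xi_i, \quad i = 1, \dots, l && \text{(2b)}\\
& \xi_i \ge 0, \quad i = 1, \dots, l && \text{(2c)}
\end{aligned}
$$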

Formulations (1) and (2) are solved through the Karush-Kuhn-Tucker conditions. The Gaussian
kernel function (radial basis function, RBF) is used in the dual formulation of (W)SVMs,
since this kernel usually yields superior performance on many classification problems.
Parameter tuning is required to set optimal or near-optimal values of $C$, $C^+$, $C^-$, and
the kernel parameters (e.g., the bandwidth parameter of the RBF kernel) to achieve good
results with (W)SVM. This process becomes problematic and time-consuming, particularly when
the dataset is very large. Hence, we aim to develop an efficient and effective classification
method, called the Multilevel (W)SVM, that is scalable and works with imbalanced healthcare
data.
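The two ingredients just discussed, class-dependent penalties and the RBF kernel, can be illustrated with a short NumPy sketch. It is not the LIBSVM implementation. It assumes penalties inversely proportional to class sizes (the rule stated in Section 3), scaled so that $C^+|\mathcal{C}^+| = C^-|\mathcal{C}^-| = Cl/2$, which is the same scaling as the "balanced" class-weight heuristic in scikit-learn. The kernel is written as $K(x, z) = \exp(-\gamma_K\|x - z\|^2)$, where `gamma` is the kernel parameter (unrelated to the REM parameter later in the paper). All function names are hypothetical.

```python
import numpy as np

def class_penalties(y, C=1.0):
    """C+ and C- inversely proportional to class sizes (the scaling is an assumption)."""
    l = len(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == -1))
    return C * l / (2.0 * n_pos), C * l / (2.0 * n_neg)

def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def weighted_objective(w, b, X, y, c_pos, c_neg):
    """WSVM primal objective (2a) for a linear kernel, with hinge-loss slacks."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    per_point_cost = np.where(y == 1, c_pos, c_neg)
    return 0.5 * float(w @ w) + float(np.sum(per_point_cost * xi))

y = np.array([1, -1, -1, -1])             # one minority point, three majority points
c_pos, c_neg = class_penalties(y)         # 2.0 and 0.666...
X = np.array([[0.0], [1.0], [2.0], [3.0]])
K = rbf_kernel(X, X, gamma=0.5)
obj = weighted_objective(np.zeros(1), 0.0, X, y, c_pos, c_neg)  # every xi = 1, obj = 4.0
```

With this scaling each class contributes the same total penalty when all of its points violate the margin, so the minority class is no longer outvoted simply by being small.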

2.3 Multilevel Support Vector Machines 

The proposed algorithm belongs to the family of multilevel optimization strategies [13], whose
goal is to approximate the system at multiple scales of coarseness and to obtain a final
solution by combining the information from different scales. The multilevel framework for SVM
[14] scales efficiently to large classification problems; its hierarchy of coarser
representations is constructed from approximate k-nearest-neighbor (AkNN) graphs. The method
consists of three main phases:

• The coarsening phase. A gradual coarsening of the training set is constructed using a
fast point selection method [15] in the AkNN graph. However, we found that ensuring a
uniform coverage of the points can lead to much better results than finding an independent
set of points (nodes in the AkNN graph), as was suggested in [16]. Thus, we extended the set
of coarse points by introducing a parameter for the minimum number of coarse points, which in
our experiments was set to 50% of the fine-level data points.

• Supervised support vector initial learning. After the hierarchy is created, support
vector learning is performed at the coarsest level, where the number of data points is
sufficiently small.

• The uncoarsening phase. The support vectors and the classifier are projected through the
hierarchy from the coarsest to the finest level. At each level, the solution at the current
fine level is updated and optimized based on the solution at the previous coarse level. The
locally optimal support vectors are obtained by gradual refinement of the support vectors
projected from the coarse level.
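The coarsening and uncoarsening phases can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' MATLAB/FLANN implementation: an exact brute-force kNN graph stands in for the AkNN graph, a greedy maximal independent set stands in for the point selection of [15], a random extension up to `min_fraction` stands in for the coverage-oriented extension, and the refinement set (projected support vectors plus their graph neighbors) is one plausible reading of the uncoarsening step. All names are hypothetical.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric kNN adjacency (exact, brute force), standing in for the AkNN graph."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    adj = np.zeros(d2.shape, dtype=bool)
    adj[np.arange(len(X))[:, None], nbrs] = True
    return adj | adj.T                             # make the graph undirected

def select_coarse_points(adj, min_fraction=0.5, seed=0):
    """Greedy independent set, then random extension to at least min_fraction of points."""
    rng = np.random.default_rng(seed)
    l = adj.shape[0]
    blocked = np.zeros(l, dtype=bool)
    chosen = []
    for i in rng.permutation(l):
        if not blocked[i]:
            chosen.append(int(i))
            blocked[i] = True
            blocked[adj[i]] = True                 # neighbors cannot also be selected
    target = int(np.ceil(min_fraction * l))
    if len(chosen) < target:                       # enforce the minimum coarse-set size
        rest = np.setdiff1d(np.arange(l), chosen)
        extra = rng.choice(rest, size=target - len(chosen), replace=False)
        chosen.extend(int(j) for j in extra)
    return np.sort(np.array(chosen))

def refinement_set(support_idx, adj):
    """Fine-level training set: projected support vectors plus their graph neighbors."""
    keep = np.zeros(adj.shape[0], dtype=bool)
    keep[support_idx] = True
    keep |= adj[support_idx].any(axis=0)
    return np.flatnonzero(keep)
```

In a full multilevel loop, `select_coarse_points` would be applied repeatedly to build the hierarchy, and `refinement_set` would be applied at each finer level to the support vectors found one level up.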

For imbalanced data, the WSVM can easily be adopted as the base classifier of the multilevel
framework (MLWSVM). The regular SVM does not perform well on imbalanced data because it tends
to fit models to the majority class and effectively ignores the minority class. The effect of
imbalance is further reduced in the multilevel framework, since we avoid creating very small
coarse sets for the minority class even when the majority class can still be coarsened.
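The safeguard for the minority class can be illustrated with a short sketch that coarsens each class separately and stops coarsening a class once it would fall below a minimum size. Random halving stands in for the AkNN-based point selection, and `build_hierarchy`, `min_class_size`, and `coarse_fraction` are hypothetical choices for illustration, not parameters reported in this paper.

```python
import numpy as np

def build_hierarchy(y, min_class_size=50, coarse_fraction=0.5, max_levels=20, seed=0):
    """Return index sets, finest level first, coarsening each class independently."""
    rng = np.random.default_rng(seed)
    level = np.arange(len(y))
    levels = [level]
    for _ in range(max_levels):
        next_parts, changed = [], False
        for label in (-1, 1):
            idx = level[y[level] == label]
            target = int(np.ceil(coarse_fraction * len(idx)))
            if target >= min_class_size:       # coarsen only if the class stays large enough
                idx = rng.choice(idx, size=target, replace=False)
                changed = True
            next_parts.append(idx)
        if not changed:                        # no class can be coarsened any further
            break
        level = np.sort(np.concatenate(next_parts))
        levels.append(level)
    return levels

y = np.array([1] * 60 + [-1] * 1000)
sizes = [(int(np.sum(y[lv] == 1)), int(np.sum(y[lv] == -1))) for lv in build_hierarchy(y)]
# sizes == [(60, 1000), (60, 500), (60, 250), (60, 125), (60, 63)]
```

The minority class keeps all 60 points at every level, while the majority class shrinks until the coarsest level is nearly balanced.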

Methods for imbalanced classification often perform poorly on data with missing values (such
as [12]), a frequent situation in healthcare data. Therefore, we apply imputation methods
before training the classification model. Such imputation methods have been well studied in
statistical analysis and machine learning [17-19]. Missing data can be categorized into three
types: missing completely at random (MCAR), missing at random (MAR), and not missing at random
(NMAR). MCAR occurs when the probability that a feature of a data instance is missing is
independent of the values of all features. Data is MAR when the probability that a feature is
missing depends on the observed values of one or more of the instance's other features. NMAR
occurs when the missingness depends on unobserved values, such as the missing values
themselves or other missing features. Although MCAR is the most convenient case, MAR occurs
more frequently in real-world problems [20].
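To make the distinction concrete, the sketch below generates MCAR and MAR missingness masks with NumPy. The MCAR mask corresponds to removing feature values uniformly at random, as done for the academic data in Section 3. The MAR mechanism (missingness in one column driven by the rank of another, always-observed column) is an illustrative assumption, and the function names are hypothetical; an NMAR mask would instead depend on the values being removed.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """MCAR: every entry is missing with the same probability, independent of all values."""
    return rng.random(shape) < rate

def mar_mask(X, target_col, driver_col, rate, rng):
    """MAR (illustrative): target_col goes missing more often when the always-observed
    driver_col is large; the average missing rate equals `rate` for rate <= 0.5."""
    mask = np.zeros(X.shape, dtype=bool)
    ranks = np.argsort(np.argsort(X[:, driver_col]))   # 0 = smallest driver value
    prob = 2.0 * rate * (ranks + 0.5) / len(X)          # grows linearly with the rank
    mask[:, target_col] = rng.random(len(X)) < np.clip(prob, 0.0, 1.0)
    return mask

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 3))
X_mcar = np.where(mcar_mask(X.shape, 0.2, rng), np.nan, X)   # NaN marks a missing value
```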

In imputation methods, the goal is to substitute a missing value with a meaningful estimate
[21]. This can be done either directly from the information in the dataset or by constructing
a predictive model for this purpose. Standard imputation methods include mean imputation
[22], kNN imputation [23], Bayesian principal component analysis (BPCA) imputation [21], and
expectation maximization (EM) [23]. We apply the EM method, which is one of the most
successful imputation methods [28]. The EM method iteratively applies linear regression
analysis and fits a new linear model to the estimated data until a local optimum is reached
[25], [27]. In the regularized adaptation of the EM method, the conditional maximum-likelihood
estimation of the regression parameters used in the conventional EM algorithm is replaced by
a regularized estimation [23].
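The two simplest baselines named above can be sketched in a few lines of NumPy, with missing entries encoded as NaN. The kNN variant shown here (root-mean-square distance over commonly observed features, averaging the k nearest rows that observe the feature) is one common formulation and an assumption on our part, not necessarily the one used in [23]; the function names are hypothetical.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN by the mean of the observed values in its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def knn_impute(X, k=3):
    """Replace each NaN by the average of that feature over the k nearest rows
    that observe it; distances use only features observed in both rows."""
    out = X.copy()
    obs = ~np.isnan(X)
    for i in np.flatnonzero(~obs.all(axis=1)):        # rows with at least one gap
        dists = np.full(len(X), np.inf)
        for j in range(len(X)):
            shared = obs[i] & obs[j]
            if j != i and shared.any():
                dists[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        order = np.argsort(dists)
        for c in np.flatnonzero(~obs[i]):
            donors = [j for j in order if obs[j, c] and np.isfinite(dists[j])][:k]
            if donors:                                  # otherwise leave the NaN in place
                out[i, c] = X[donors, c].mean()
    return out

X = np.array([[1.0, 2.0], [2.5, np.nan], [5.0, 6.0]])
print(mean_impute(X)[1, 1], knn_impute(X, k=1)[1, 1])  # 4.0 2.0
```

Mean imputation ignores the fact that the incomplete row is closer to the first row; the kNN variant uses that similarity, which is the basic idea the regression-based EM approach develops further.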

2.4 Regularized Expectation-Maximization 

In preprocessing, when the data contain many missing values, we apply the EM algorithm. It
iteratively calculates the maximum-likelihood (ML) estimates of the parameters by exploiting
the relationship between the complete data and the incomplete data (with missing features)
[29]. It has been demonstrated in many cases that the EM algorithm converges reliably and
requires little storage; it is not computationally expensive and can be easily implemented
[30]. In EM we maximize the log-likelihood function

$$
L(\theta; y) = \sum_{i=1}^{n} \log p(x_i \mid \theta), \qquad \text{(3)}
$$

where $y = \{x_i \mid i = 1, \dots, n\}$ are the observations, drawn independently from a
distribution $p(x)$ parametrized by $\theta$.
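A minimal NumPy sketch of EM imputation follows, under the assumption (ours, for illustration) that the rows are independent draws from a multivariate Gaussian, so $\theta$ consists of a mean vector and a covariance matrix. The E-step fills each record's missing entries with their conditional mean given the observed entries, which is a linear regression on the observed features and matches the iterated-regression view described above; the M-step re-estimates the mean and covariance, adding the conditional covariance of the imputed entries. The function `em_impute` is hypothetical and omits safeguards (for example, against singular covariance blocks) that a production implementation would need.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-8):
    """EM imputation under a multivariate Gaussian model (NaN marks missing values)."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)                     # start from mean imputation
    sigma = np.cov(filled, rowvar=False, bias=True)
    for _ in range(n_iter):
        correction = np.zeros((d, d))
        new = filled.copy()
        # E-step: conditional mean (regression on observed features) per record
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            if o.any():
                B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                new[i, m] = mu[m] + B @ (X[i, o] - mu[o])
                correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
            else:                                      # nothing observed: use the mean
                new[i, m] = mu[m]
                correction[np.ix_(m, m)] += sigma[np.ix_(m, m)]
        # M-step: update mean and covariance of the completed data
        mu = new.mean(axis=0)
        centered = new - mu
        sigma = (centered.T @ centered + correction) / n
        delta = np.max(np.abs(new - filled))
        filled = new
        if delta < tol:
            break
    return filled, mu, sigma
```

The covariance correction term is what distinguishes this from simply re-running regression imputation: without it, the estimated covariance would shrink with every iteration.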

The regularized EM algorithm (REM) was developed to control the level of uncertainty
associated with missing values [31]. The main idea is to regularize the likelihood function
according to the mutual relationship between the observations and the missing data, favoring
imputations with little uncertainty and maximum information. Intuitively, it is desirable for
the missing data to have a strong probabilistic association with the observations, which
indicates little uncertainty about the missing data given the observations. REM performs
linear regression iteratively to impute the missing values and optimizes the penalized
likelihood as follows:


$$
\tilde{L}(\theta; y) = L(\theta; y) + \gamma\, P(y, z \mid \theta), \qquad \text{(4a)}
$$

where $z$ denotes the missing data and $P$ is the distribution function of the complete data
given $\theta$. The trade-off between the degree of regularization of the solution and the
likelihood function is controlled by the regularization parameter $\gamma$ [31]. In addition to
reducing the uncertainty about the missing data, REM preserves the advantages of the standard
EM method. It is particularly efficient for overly complex models.
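The penalized likelihood above is specific to REM [31]. A different, widely used way to regularize the regression inside EM, closer to the regularized adaptation mentioned in Section 2.3, is to replace the conditional maximum-likelihood regression coefficients with ridge-regression estimates. The sketch below shows only that swapped-in ridge step for a single record; it is not the REM penalty of equation (4a), and the ridge form $(\Sigma_{oo} + h^2\,\mathrm{diag}(\Sigma_{oo}))^{-1}\Sigma_{om}$, the parameter `h`, and the function name are assumptions for illustration.

```python
import numpy as np

def ridge_conditional_mean(mu, sigma, x, observed, h):
    """Regularized E-step for one record (ridge variant; an assumption).

    Missing entries are predicted from the observed ones with coefficients
    (S_oo + h^2 * diag(S_oo))^{-1} S_om instead of S_oo^{-1} S_om, which
    stays well defined when S_oo is singular or ill-conditioned.
    """
    missing = ~observed
    s_oo = sigma[np.ix_(observed, observed)]
    s_om = sigma[np.ix_(observed, missing)]
    ridge = s_oo + h ** 2 * np.diag(np.diag(s_oo))     # shrink toward independence
    coef = np.linalg.solve(ridge, s_om)                # shape (n_observed, n_missing)
    return mu[missing] + coef.T @ (x[observed] - mu[observed])

# Two perfectly collinear observed features make S_oo singular (h = 0 would fail).
sigma = np.array([[1.0, 1.0, 2.0], [1.0, 1.0, 2.0], [2.0, 2.0, 5.0]])
mu = np.zeros(3)
x = np.array([1.0, 1.0, np.nan])
observed = np.array([True, True, False])
print(ridge_conditional_mean(mu, sigma, x, observed, h=0.1))  # about [1.99]
```

Larger `h` shrinks the imputed values toward the mean, trading a little bias for stability; it plays a role loosely analogous to the trade-off parameter $\gamma$ in REM.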
Table 1: Confusion matrix

                            Actual positive class    Actual negative class
Predicted positive class    True Positive (TP)       False Positive (FP)
Predicted negative class    False Negative (FN)      True Negative (TN)


2.5 Performance Measures 

Classification algorithms are evaluated with performance measures calculated from the
confusion matrix (Table 1). For binary classification problems, the performance measures are
defined as accuracy (ACC), sensitivity (SN), specificity (SP), and G-mean, namely

$$
SN = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}, \qquad \text{(5)}
$$

$$
\text{G-mean} = \sqrt{SP \cdot SN}, \qquad \text{(6)}
$$

$$
ACC = \frac{TP + TN}{FP + TN + TP + FN}. \qquad \text{(7)}
$$
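Equations (5) to (7) translate directly into code. The short Python sketch below (the function name is hypothetical) also shows why G-mean is preferred for imbalanced data: a classifier that always predicts the majority class reaches high accuracy but a G-mean of zero.

```python
import math

def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, G-mean, and accuracy from a binary confusion matrix."""
    sn = tp / (tp + fn)                          # eq. (5): recall on the positive class
    sp = tn / (tn + fp)                          # eq. (5): recall on the negative class
    return {
        "SN": sn,
        "SP": sp,
        "G-mean": math.sqrt(sp * sn),            # eq. (6)
        "ACC": (tp + tn) / (fp + tn + tp + fn),  # eq. (7)
    }

# 50 positives and 950 negatives; every point is predicted negative.
m = confusion_metrics(tp=0, fp=0, fn=50, tn=950)
# m["ACC"] == 0.95 while m["G-mean"] == 0.0
```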


3 Computational Results 

We evaluate the proposed classification framework on academic benchmarks (UCI [32] and the
cod-rna dataset [33]) and on real-life classification benchmarks [7, 8]. The coarsest-level
and refinement (W)SVM models are solved using LIBSVM-3.18 [9], and the FLANN library [34] is
used to create the AkNN graphs. The multilevel frameworks, data processing, and further
scripting are implemented in MATLAB 2012a. C4.5, Naive Bayes (NB), Logistic Regression (LR),
and 5-Nearest Neighbor (5NN) are run using WEKA interfaced with MATLAB. A standard 10-fold
cross-validation setup is used. We create missing values in the training sets of the academic
data by removing feature values at random. In our implementation, the misclassification
penalties (weights) are set inversely proportional to the size of each class. As a
preprocessing step, the whole dataset is normalized before classification. Nested uniform
design (UD) is performed on the training data for (W)SVM model selection [35]; the UD
methodology has been very successful for model selection in supervised learning [36]. A
close-to-optimal parameter set is reached through an iterative nested process [35], and the
final parameter set is selected by maximizing G-mean, since the data may be imbalanced. A
9-point and a 5-point run design are used for the first and second stages of the nested UD,
respectively, due to their superiority on the UCI data [38]. The performance measures
(sensitivity, specificity, G-mean, and accuracy) are calculated on the testing data.
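The 10-fold cross-validation protocol can be sketched as below. Stratifying the folds by class, so that each fold keeps roughly the original class ratio, is our assumption for illustration (the text only states that 10-fold cross-validation is used), as is the function name. With imbalanced classes, stratification keeps the G-mean-based model selection from seeing folds with almost no minority points.

```python
import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    start = 0
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for pos, i in enumerate(idx):          # deal class members round-robin
            folds[(start + pos) % k].append(int(i))
        start += len(idx)                      # continue where the previous class stopped
    return [np.array(sorted(f)) for f in folds]

y = np.array([1] * 30 + [-1] * 170)
for test_idx in stratified_folds(y, k=10):
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    # ...fit on train_idx (with imputation and model selection), evaluate on test_idx
```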

3.1 Academic data sets 

We compared popular methods with the proposed ML(W)SVM on imperfect data. Table 3 shows the
comparative results of MLSVM, MLWSVM, SVM, WSVM, Naive Bayes, C4.5, LR, and 5NN on the
academic data sets, whose properties are summarized in Table 2.

Table 2: Academic data sets. r_imb is the imbalance ratio, n the number of features, |J| the
number of data points, and |C+| and |C-| the sizes of the positive and negative classes.

Dataset          r_imb       n       |J|      |C+|      |C-|
Twonorm           0.50      20      7400      3703      3697
Letter26          0.96      16     20000       734     19266
Ringnorm          0.50      20      7400      3664      3736
Cod-rna           0.67       8     59535     19845     39690
Clean (Musk)      0.85     166      6598      1017      5581
Advertisement     0.86    1558      3279       459      2820
Nursery           0.67       8     12960      4320      8640
Hypothyroid       0.94      21      3919       240      3679
Buzz              0.80      77    140707     27775    112932

These methods are examined for missing value ratios ($r_{mv}$) of 5%, 10%, 20%, and 40%. We
implemented the REM method for missing data imputation [25]. Comparing, for each dataset and
missing value level, which method attains the highest G-mean shows that MLWSVM and WSVM
generally perform better than the other methods at all missing value ratios. In fact, MLWSVM
and WSVM yield the highest G-mean values in 19 out of 36 dataset/$r_{mv}$ combinations,
followed by MLSVM and SVM with 13 out of 36. Moreover, the ML(W)SVM techniques achieve much
shorter computational times than the standard (W)SVM (Table 4).

3.2 Healthcare data sets 

We present the results of comparing classification algorithms on the real-life healthcare
data sets. Table 5 shows the results for Example 1 (see Section 1), a classification task of
assigning a patient to the correct financial risk group; the groups are ordered from group 1,
with the lowest level of risk, to group 5, with the highest level of risk.

The motivation behind the original study was to determine how much integrating the medical
and financial data changes the outcomes of clustering and classification operations based on
financial data alone. For that purpose, modest precision was sufficient; we used a logistic
(linear) regression (LR) approach (implemented as mnrfit in MATLAB). We compare its accuracy
with the best results obtained by ML(W)SVM. The one-against-all strategy is used for
multi-class classification: one classifier is trained per class, with the data points of that
class as the positive class and all remaining data points as the negative class.
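The one-against-all scheme can be sketched generically: train one binary scorer per class on relabeled ±1 targets, then assign each point to the class whose scorer gives the largest value. In the study this role is played by ML(W)SVM; in the self-contained sketch below, a ridge-regularized least-squares linear scorer stands in for it, and all names are hypothetical.

```python
import numpy as np

def fit_linear_scorer(X, t, lam=1e-3):
    """Stand-in binary learner: ridge least squares on +/-1 targets."""
    A = np.hstack([X, np.ones((len(X), 1))])           # append a bias column
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ t)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def one_against_all(X, labels, fit=fit_linear_scorer):
    """Train one scorer per class (class c vs. the rest); predict by the largest score."""
    classes = np.unique(labels)
    scorers = [fit(X, np.where(labels == c, 1.0, -1.0)) for c in classes]

    def predict(Z):
        scores = np.column_stack([s(Z) for s in scorers])
        return classes[np.argmax(scores, axis=1)]

    return predict

# Three well-separated synthetic groups (labels 1..3, as risk groups would be 1..5).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 5.0], [5.0, 0.0], [-5.0, -5.0]])
labels = np.repeat([1, 2, 3], 30)
X = centers[labels - 1] + 0.3 * rng.normal(size=(90, 2))
predict = one_against_all(X, labels)
```

Because scores from independently trained classifiers are compared directly, the base learner should produce scores on a comparable scale; SVM decision values are commonly used this way.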

To interpret the results, we note that correct identification of intermediate risk categories
is a very difficult problem in medical informatics. To our knowledge, there is no good
definition of “average health”, either evidence-driven or philosophical, that would help an
expert identify patient features that indicate neither an acute crisis nor near-certain safety
from crisis. Accordingly, it is not surprising that neither approach does well on risk
categories 2-4; there is also little motivation to improve the model there. On the other
hand, it is important to identify and predict the very low-risk patients (knowing that status
ahead of time allows resources to be re-allocated, leading to savings and improved service for
everyone) and the very high-risk patients (so that clinical and financial resources can be
prepared for the forthcoming crisis).

It is therefore important that the use of an advanced machine learning method changes the
quality of prediction for these extreme groups from almost worthless (“toss a coin”: accuracy
of 0.58 and 0.59 for LR on groups 1 and 5) to workable (accuracy above 0.7 for MLSVM; see
Table 5).

Table 6 compares the results of the widely used basic approach and ML(W)SVM prediction for
Example 2 (see Section 1), a study of patients’ response to hospital flu outreach. In this
problem, the goal is to find a binary classifier that predicts whether or not the patient will
get vaccinated after a reminder (not getting vaccinated includes using a different provider
for vaccination). In the preliminary study, we used an adaptive linear model (LASSO for
adaptive feature selection, followed by logistic regression for the actual prediction).

Response to outreach is not a life-or-death issue; we performed this study to see whether
predictive modeling can assist with resource allocation (which patients to contact, and how
much medical personnel effort to dedicate to outreach and then vaccination). Arguably,
accuracy is more important than specificity here. Even the basic results (using linear
regression) were met with approval by the CPSL (Care Patient Service Line: a division
responsible for coordinating the efforts of local, small-scale healthcare providers operating
under Geisinger). The SVM methods (accuracy rises from 0.7619 to 0.8496, an improvement of
almost 9 percentage points) provide additional justification for the use of machine learning
on merged data to assist planning in clinical practice.

4 Conclusion 

Large-scale data, missing or imperfect features, and skewed class distributions are common
challenges in pattern recognition for many healthcare problems. We have successfully extended
a powerful machine learning technique, support vector machines, to a scalable multilevel
framework of cost-sensitive SVM learning for imbalanced classification problems. Our
multilevel framework substantially reduces computational time without losing classifier
quality on large-scale datasets. We have shown that MLWSVM produces better results than MLSVM
and the regular SVM methods in most cases. This work can be extended to other classification
problems with large-scale imbalanced data (combined from different sources) with missing
features in healthcare and engineering applications.

From the perspective of evidence-driven healthcare, our work shows that applying
cutting-edge machine learning techniques (in this case, fast multilevel classifiers) makes
enough of a difference to justify the additional development effort for typical examples from
clinical practice. Although the improvements in precision and specificity shown in this study
are both under 10% and modest in general terms, the result is significant in healthcare.

To our knowledge, such complex combined behavioral/operational phenomena as the inference of
financial risk from medical history (Example 1) or the prediction of the effectiveness of
public outreach (Example 2) do not have a satisfactory causal explanation. Classical (1990s)
clinical practice offered two equally unsatisfactory options: having no capability for
prediction at all, or relying on very basic statistical techniques (based on a single data
source, with a very high rate of false-positive classification outcomes). The existing mature
models (such as actuarial projections of financial risk) do not benefit from integrating data
from multiple sources and may, in fact, turn out to be ineffective outside their scope in
patient population and metrics of interest (as we have shown in [7]). Thus, modern clinical
practice has to rely on newly developed machine learning tools tuned on data from multiple
sources. Our work can also be extended to handle other classification problems on massive,
multi-format medical data.
Table 3: Comparative G-mean results for ML(W)SVM against the regular SVM, WSVM, NB, C4.5,
5NN, and LR on academic datasets for different fractions of missing values (r_mv).

Dataset         r_mv   MLSVM  MLWSVM     SVM    WSVM    C4.5     5NN      NB      LR
Twonorm           5%    0.98    0.98    0.98    0.98    0.86    0.97    0.98    0.98
                 10%    0.98    0.98    0.97    0.97    0.87    0.97    0.97    0.97
                 20%    0.98    0.98    0.98    0.98    0.88    0.97    0.97    0.98
                 40%    0.97    0.97    0.97    0.97    0.89    0.97    0.98    0.98
Letter            5%    0.97    1.00    0.99    0.99    0.97    0.98    0.86    0.81
                 10%    0.98    1.00    0.98    0.99    0.98    0.98    0.86    0.80
                 20%    1.00    1.00    0.99    0.99    0.97    0.98    0.87    0.80
                 40%    0.95    0.97    0.96    0.99    0.97    0.98    0.88    0.83
Ringnorm          5%    0.97    0.98    0.97    0.98    0.91    0.61    0.99    0.76
                 10%    0.98    0.98    0.99    0.99    0.91    0.62    0.98    0.76
                 20%    0.98    0.98    0.97    0.98    0.91    0.62    0.98    0.76
                 40%    0.98    0.98    0.97    0.98    0.91    0.62    0.98    0.76
Cod-rna           5%    0.95    0.96    0.96    0.96    0.95    0.92    0.66    0.93
                 10%    0.95    0.96    0.95    0.96    0.95    0.91    0.66    0.92
                 20%    0.95    0.96    0.95    0.95    0.94    0.91    0.67    0.92
                 40%    0.95    0.95    0.95    0.95    0.93    0.90    0.68    0.91
Clean             5%    1.00    0.99    0.98    1.00    0.83    0.92    0.79    0.89
                 10%    0.99    1.00    0.99    1.00    0.83    0.91    0.79    0.89
                 20%    1.00    1.00    1.00    1.00    0.83    0.91    0.79    0.89
                 40%    1.00    1.00    1.00    1.00    0.82    0.92    0.79    0.89
Advertisement     5%    0.87    0.87    0.87    0.87    0.92    0.81    0.60    0.82
                 10%    0.87    0.87    0.86    0.86    0.86    0.85    0.62    0.82
                 20%    0.83    0.85    0.83    0.85    0.89    0.83    0.61    0.83
                 40%    0.84    0.86    0.87    0.81    0.91    0.85    0.62    0.82
Nursery           5%    0.99    0.99    1.00    1.00    1.00    1.00    0.00    1.00
                 10%    0.99    0.99    1.00    1.00    1.00    1.00    0.00    1.00
                 20%    0.96    0.96    1.00    1.00    1.00    1.00    0.00    1.00
                 40%    0.92    0.92    1.00    1.00    1.00    0.99    0.46    1.00
Hypothyroid       5%    0.83    0.87    0.81    0.87    0.96    0.76    0.97    0.88
                 10%    0.85    0.86    0.78    0.86    0.96    0.76    0.96    0.89
                 20%    0.84    0.86    0.72    0.86    0.96    0.75    0.97    0.90
                 40%    0.86    0.88    0.84    0.88    0.96    0.76    0.97    0.89
Buzz              5%    0.94    0.94    0.94    0.94    0.94    0.93    0.89    0.94
                 10%    0.94    0.94    0.94    0.94    0.94    0.93    0.89    0.94
                 20%    0.92    0.94    0.93    0.94    0.94    0.93    0.88    0.93
                 40%    0.93    0.93    0.93    0.93    0.94    0.94    0.86    0.94

Table 4: Computational time (sec.)

Dataset           MLSVM       SVM    MLWSVM      WSVM
Twonorm               6        29         6        29
Letter               37       145        39       146
Ringnorm              5        26         5        27
Cod-rna             300      1865       315      1891
Clean                25       103        23        90
Advertisement        99       228       101       232
Nursery              26       188        32       193
Hypothyroid           2         3         2         3
Buzz               3915     26963      4705     27732
Table 5: Accuracy of financial risk problem with five risk classes (Example 1)

Class          1       2       3       4       5
LR          0.58    0.54    0.53    0.51    0.59
MLSVM       0.73    0.50    0.44    0.50    0.71


Table 6: Comparison of Multilevel WSVM against Multilevel SVM and Adaptive Logistic
Regression (LR). Improved results are in bold.

                 G-mean        SN        SP       ACC
Adaptive LR      0.7516    0.8903    0.6345    0.7619
MLSVM            0.8012    0.9750    0.6583    0.8496
MLWSVM           0.8016    0.9739    0.6598    0.8495